Efficient Reductions for Imitation Learning: Supplementary Material
Let $\epsilon_i = \mathbb{E}_{s \sim d^i_{\pi^*}}[e_{\hat{\pi}}(s)]$ for $i = 1, 2, \dots, T$ denote the expected 0-1 loss of $\hat{\pi}$ at time $i$, so that $\epsilon = \frac{1}{T}\sum_{i=1}^T \epsilon_i$. Note that $\epsilon_t$ is the probability that $\hat{\pi}$ makes a mistake under the distribution $d^t_{\pi^*}$. Let $p_t$ denote the probability that $\hat{\pi}$ has not made a mistake (with respect to $\pi^*$) in the first $t$ steps, and let $d_t$ denote the distribution of states $\hat{\pi}$ is in at time $t$ conditioned on having made no mistake so far (under this conditioning $\hat{\pi}$ has chosen the same actions as $\pi^*$, so $d_t$ is also the expert's state distribution at time $t$ under that event). Let $d'_t$ denote the distribution of states at time $t$ obtained by following $\pi^*$, conditioned on $\hat{\pi}$ making at least one mistake in the first $t-1$ visited states. Then $d^t_{\pi^*} = p_{t-1} d_t + (1 - p_{t-1}) d'_t$.

At time $t$, the expected cost of $\hat{\pi}$ is at most 1 if it has already made a mistake, and $\mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)]$ if it has not made a mistake yet. So $J(\hat{\pi}) \le \sum_{t=1}^T \left[ p_{t-1} \mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] + (1 - p_{t-1}) \right]$. Let $e_t$ and $e'_t$ denote the probability that $\hat{\pi}$ makes a mistake under $d_t$ and $d'_t$, respectively. Then $\mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] \le \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + e_t$, and since $\epsilon_t = p_{t-1} e_t + (1 - p_{t-1}) e'_t$, we have $p_{t-1} e_t \le \epsilon_t$. Additionally, since $p_t = (1 - e_t) p_{t-1}$, we get $p_t \ge p_{t-1} - \epsilon_t \ge 1 - \sum_{i=1}^t \epsilon_i$, i.e. $1 - p_t \le \sum_{i=1}^t \epsilon_i$. Finally, note that $J(\pi^*) = \sum_{t=1}^T \left[ p_{t-1} \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + (1 - p_{t-1}) \mathbb{E}_{s \sim d'_t}[C_{\pi^*}(s)] \right]$, so that $\sum_{t=1}^T p_{t-1} \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] \le J(\pi^*)$. Using these facts we obtain:

$$
\begin{aligned}
J(\hat{\pi}) &\le \sum_{t=1}^T \left[ p_{t-1} \mathbb{E}_{s \sim d_t}[C_{\hat{\pi}}(s)] + (1 - p_{t-1}) \right] \\
&\le \sum_{t=1}^T \left[ p_{t-1} \mathbb{E}_{s \sim d_t}[C_{\pi^*}(s)] + p_{t-1} e_t + (1 - p_{t-1}) \right] \\
&\le J(\pi^*) + \sum_{t=1}^T \sum_{i=1}^t \epsilon_i \qquad \text{(using } p_{t-1} e_t \le \epsilon_t \text{ and } 1 - p_{t-1} \le \textstyle\sum_{i=1}^{t-1} \epsilon_i\text{)} \\
&\le J(\pi^*) + T \sum_{t=1}^T \epsilon_t = J(\pi^*) + T^2 \epsilon.
\end{aligned}
$$
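As a rough sanity check of the bound (not part of the original supplementary material), the short Monte Carlo sketch below instantiates the worst case used in the proof: the expert incurs cost 0, $\hat{\pi}$ disagrees with the expert with probability $\epsilon$ on states drawn from the expert's distribution, and it incurs the maximal per-step cost of 1 from its first mistake onward. The function names (`simulate_episode`, `average_cost`) and the specific constants are illustrative assumptions, not from the paper.

```python
import random

def simulate_episode(T, eps):
    """Total cost of one length-T episode in the worst-case toy model:
    cost 0 while pi_hat still agrees with the expert, cost 1 per step
    from the first disagreement onward (disagreement probability eps)."""
    cost, mistake_made = 0.0, False
    for _ in range(T):
        if not mistake_made and random.random() < eps:
            mistake_made = True   # first mistake: pi_hat leaves the expert's distribution
        if mistake_made:
            cost += 1.0           # worst-case per-step cost once off-distribution
    return cost

def average_cost(T, eps, episodes=20000):
    """Monte Carlo estimate of J(pi_hat) in this toy model."""
    return sum(simulate_episode(T, eps) for _ in range(episodes)) / episodes

if __name__ == "__main__":
    T, eps = 50, 0.01
    print("empirical J(pi_hat):", round(average_cost(T, eps), 2))
    print("bound J(pi*) + T^2 * eps:", T * T * eps)  # J(pi*) = 0 in this toy model
```

In this toy model the empirical cost comes out around 11, i.e. on the order of $\epsilon T^2 / 2$: it grows quadratically in $T$ and stays below the theorem's bound of $J(\pi^*) + T^2\epsilon = 25$, which is the compounding-error behaviour the result formalizes.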